XML Rules for Enclitic Segmentation
Authors
Abstract
Segmenting sentences into words is an important task in robust part-of-speech (POS) tagging systems. In some cases this is relatively simple, since each textual word (or token) corresponds to one linguistic component. In many others it is much harder: in contractions, verbal forms with enclitic pronouns, and similar cases, a single token carries information about two or more linguistic components. There are two main approaches to solving these difficult cases:
Similar resources
A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling
In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...
Modified CLPSO-based fuzzy classification System: Color Image Segmentation
Fuzzy segmentation is an effective way of segmenting out objects in images containing both random noise and varying illumination. In this paper, a modified method based on the Comprehensive Learning Particle Swarm Optimization (CLPSO) is proposed for pixel classification in HSI color space by selecting a fuzzy classification system with minimum number of fuzzy rules and minimum number of incorr...
Word segmentation in Persian continuous speech using F0 contour
Word segmentation in continuous speech is a complex cognitive process. Previous research on spoken word segmentation has revealed that in fixed-stress languages, listeners use acoustic cues to stress to de-segment speech into words. It has been further assumed that stress in non-final or non-initial position hinders the demarcative function of this prosodic factor. In Persian, stress is retract...
Automated Tumor Segmentation Based on Hidden Markov Classifier using Singular Value Decomposition Feature Extraction in Brain MR images
Introduction: Diagnosing brain tumor is not always easy for doctors, and existence of an assistant that facilitates the interpretation process is an asset in the clinic. Computer vision techniques are devised to aid the clinic in detecting tumors based on a database of tumor c...